An American option grants the holder the right to select the
time at which to exercise the option, so pricing an American option
entails solving an optimal stopping problem. Difficulties in applying
standard numerical methods to complex pricing problems have
motivated the development of techniques that combine Monte Carlo
simulation with dynamic programming. One class of methods approximates
the option value at each time using a linear combination of
basis functions, and combines Monte Carlo with backward induction
to estimate optimal coefficients in each approximation. We analyze
the convergence of such a method as both the number of basis functions
and the number of simulated paths increase. We get explicit
results when the basis functions are polynomials and the underlying
process is either Brownian motion or geometric Brownian motion. We
show that the number of paths required for worst-case convergence
grows exponentially in the degree of the approximating polynomials
in the case of Brownian motion and faster in the case of geometric
Brownian motion.

1. Introduction. An American option grants the holder the right to select
the time at which to exercise the option, and in this differs from a
European option which may be exercised only at a fixed date. A standard 
result in the theory of contingent claims states that the equilibrium price 
of an American option is its value under an optimal exercise policy (see, 
e.g., Chapter 8 of [6]). Pricing an American option thus entails solving an 
optimal stopping problem, typically with a finite horizon. 

Solving this optimal stopping problem and pricing an American option 
are relatively straightforward in low dimensions. Assuming a Markovian
formulation of the problem, the relevant dimension is the dimension of the
state vector, and this is ordinarily at least as large as the number of
underlying assets on which the payoff of the option depends. In up to about
three dimensions, the problem can be solved using a variety of numerical
methods, including binomial lattices, finite-difference methods and
techniques based on variational inequalities. (See, e.g., Chapter 5 of [10]
or Chapter 9 of [19] for an introduction to these methods.) But many
problems arising in practice have much higher dimensions, and these
applications have motivated the development of Monte Carlo methods for
pricing American options. The optimal stopping problem embedded in the
valuation of an American option makes this an unconventional and
challenging problem for Monte Carlo.

One class of techniques, based primarily on proposals of Carriere [4], 
Longstaff and Schwartz [11] and Tsitsiklis and Van Roy [17, 18], provides 
approximate solutions by combining simulation, regression and a dynamic 
programming formulation of the problem. Related methods have been used 
to solve dynamic programming problems in other contexts; Bertsekas and 
Tsitsiklis [1] discuss several techniques and applications. In this approach 
to American option pricing, the value function describing the option price 
at each time as a function of the underlying state is approximated by a 
linear combination of basis functions; the coefficients in this representation 
are estimated by applying regression to the simulated paths. Such an
approximation is computed at each step in a dynamic programming procedure
that starts with the option value at expiration and works backward to find
the value at the current time. Any such method clearly restricts the number
of possible exercise dates to be finite; these dates may be specified in the
terms of the option, or they may serve as a discrete-time approximation for 
a continuously-exercisable option. 

The convergence results available to date for these methods are based on 
letting the number of simulated paths increase while holding the number of 
basis functions fixed. Tsitsiklis and Van Roy [18] prove such a result for their 
method and Clement, Lamberton and Protter [5] do this for the method of 
Longstaff and Schwartz [11]. (The two methods differ in the backward
induction procedure they use to solve the dynamic programming problem.)
The convergence established by these results is therefore convergence to the
approximation that would be obtained if the calculations could be carried
out exactly, without the sampling error associated with Monte Carlo.
Convergence to the correct option price requires a separate passage to the limit
in which the number of basis functions increases. 

This paper considers settings in which the number of paths and number of 
basis functions increase together. Our objective is to determine how quickly 
the number of paths must grow with the number of basis functions to
ensure convergence to the correct value. The growth required turns out to be
surprisingly fast in the settings we analyze. We take the underlying process
to be Brownian motion or geometric Brownian motion and regress against
polynomials in each case. We examine conditions for convergence to hold
uniformly over coefficient vectors having a fixed norm, and in this sense our
results provide a type of worst-case analysis. We show that for Brownian
motion, the number of polynomials $K = K_N$ for which accurate estimation
is possible from $N$ paths is $O(\log N)$; for geometric Brownian motion it is
$O(\sqrt{\log N})$. Thus, the number of paths must grow exponentially with the
number of polynomials in the first case, faster in the second case.

Focusing on simple models allows us to give rather precise results. Our 
most explicit results apply to one-dimensional problems that do not require 
Monte Carlo methods, but we believe they are, nevertheless, relevant to 
higher-dimensional problems. Many high-dimensional interest rate models 
have dynamics that are nearly Brownian or nearly log-Brownian; see, for 
example, the widely used models in Chapters 14 and 15 of [13]. Our focus 
on polynomials helps make our results explicit and is also consistent with,
for instance, the examples in [11] and remarks in [5]. Our analysis relies on
asymptotics of moments of the functions used in the regressions. To the 
extent that similar asymptotics could be derived for other basis functions 
and underlying distributions, our approach could be used in other settings. 

We prove two types of results, providing upper and lower bounds on K 
and thus corresponding to negative and positive results, respectively. For 
an upper bound on K, it suffices to exhibit a problem for which convergence
fails. For this part of the analysis we therefore consider a single-period
problem, that is, a single regression and a single step in the backward
induction. The fact that an exponentially growing sample size is necessary
even in a one-dimensional, single-period problem makes the result all the
more compelling. For the positive results we consider an arbitrary but fixed
number of steps, corresponding to a finite set of exercise opportunities. We
prove a general error bound that relies on few assumptions about the
underlying Markov process or basis functions, and then specialize to the case of
polynomials with Brownian motion and geometric Brownian motion. 

Section 2 formulates the American option pricing problem, discusses
approximate dynamic programming and presents the algorithm we analyze.
Section 3 undertakes the single-period analysis, first in a normal setting 
then in a lognormal setting. Section 4 presents results for the multiperiod 
case. Proofs of some of the results in Sections 3 and 4 are deferred to Sections 
5 and 6, respectively. 

2. Problem formulation. In this section we first give a general description
of the American option pricing problem, then discuss approximate dynamic
programming procedures and then detail the algorithm we analyze.

2.1. The optimal stopping problem. A general class of American option
pricing problems can be formulated through an $\Re^d$-valued Markov process
$\{S(t), 0 \le t \le T\}$ [with $S(0)$ fixed], that records all relevant
financial information, including the prices of underlying assets. We restrict
attention to options admitting a finite set of exercise opportunities
$0 = t_0 < t_1 < t_2 < \cdots < t_m \le T$, sometimes called Bermudan options.
(We preserve the continuous-time specification of $S$ because the lengths of
the intervals $t_{i+1} - t_i$ appear in some of our results.) If exercised at
time $t_n$, $n = 0, 1, \ldots, m$, the option pays $h_n(S(t_n))$, for some
known functions $h_0, h_1, \ldots, h_m$ mapping $\Re^d$ into $[0,\infty)$. Let
$\mathcal{T}_n$ denote the set of stopping times (with respect to the history
of $S$) taking values in $\{t_n, t_{n+1}, \ldots, t_m\}$ and define

(1)  $V_n^*(x) = \sup_{\tau \in \mathcal{T}_n} E[h_\tau(S(\tau)) \mid S(t_n) = x], \qquad x \in \Re^d,$

for $n = 0, 1, \ldots, m$. Then $V_n^*(x)$ is the value of the option at $t_n$
in state $x$, given that the option was not exercised prior to $t_n$. For
simplicity, we have not included explicit discounting in (1). Deterministic
discounting can be absorbed into the definition of the functions $h_n$, and
stochastic discounting can usually be accommodated in this formulation at the
expense of increasing the dimension of $S$.

The option values satisfy the dynamic programming equations

(2)  $V_m^*(x) = h_m(x),$

(3)  $V_n^*(x) = \max\{h_n(x),\, E[V_{n+1}^*(S(t_{n+1})) \mid S(t_n) = x]\},$

$n = 0, 1, \ldots, m-1$. These can be rewritten in terms of continuation values

$C_n^*(x) = E[V_{n+1}^*(S(t_{n+1})) \mid S(t_n) = x], \qquad n = 0, 1, \ldots, m-1,$

as

(4)  $C_m^*(x) = 0,$

(5)  $C_n^*(x) = E[\max\{h_{n+1}(S(t_{n+1})),\, C_{n+1}^*(S(t_{n+1}))\} \mid S(t_n) = x],$

$n = 0, 1, \ldots, m-1$. The option values satisfy

$V_n^*(x) = \max\{h_n(x),\, C_n^*(x)\},$

so these can be calculated from the continuation values.
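
As a small illustration of the recursions (2)-(5), the following sketch runs the backward induction exactly on a finite-state Markov chain, where the conditional expectations reduce to matrix-vector products. The three-state chain, the price levels and the put-style payoff are hypothetical choices made only for this example; they do not come from the models analyzed later.

```python
def bermudan_value(payoffs, transition):
    """Backward induction for (2)-(5) on a finite-state Markov chain.

    payoffs[n][x]    : payoff h_n(x) at exercise date n in state x
    transition[x][y] : one-step probability of moving from state x to y
    Returns (V, C), the lists of value and continuation-value vectors.
    """
    m = len(payoffs) - 1
    states = range(len(transition))
    V = [None] * (m + 1)
    C = [None] * (m + 1)
    V[m] = list(payoffs[m])            # equation (2)
    C[m] = [0.0 for _ in states]       # equation (4)
    for n in range(m - 1, -1, -1):
        # C_n(x) = E[V_{n+1}(S_{n+1}) | S_n = x], the conditional expectation in (3)
        C[n] = [sum(transition[x][y] * V[n + 1][y] for y in states)
                for x in states]
        V[n] = [max(payoffs[n][x], C[n][x]) for x in states]
    return V, C


# Hypothetical toy data: three price levels, a lazy random walk, strike 100.
prices = [80.0, 100.0, 120.0]
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
h = [[max(100.0 - s, 0.0) for s in prices] for _ in range(3)]  # dates 0, 1, 2
V, C = bermudan_value(h, P)
print(V[0])  # values at date 0 in each state
```

In this toy setting the expectations are computed exactly, which is the step that becomes infeasible in high dimensions and motivates the regression-based approximation.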

2.2. Approximate dynamic programming. Exact calculation of (2)-(3) or
(4)-(5) is often impractical, and even estimation by Monte Carlo is
challenging because of the difficulty of estimating the conditional
expectations in these equations. Approximate dynamic programming procedures
replace these conditional expectations with linear combinations of known
functions, sometimes called “features” but more commonly referred to as
basis functions. Thus, for each $n = 1, \ldots, m$, let $\psi_{nk}$,
$k = 0, \ldots, K$, be functions from $\Re^d$ to $\Re$ and consider
approximations of the form

$C_n^*(x) \approx \sum_{k=0}^{K} \beta_{nk}\,\psi_{nk}(x)$

for some constants $\beta_{nk}$, or the corresponding approximation for
$V_n^*$. Working with approximations of this type reduces the problem of
finding the functions $C_n^*$ to one of finding the coefficients
$\beta_{nk}$. The methods of Longstaff and Schwartz [11] and Tsitsiklis and
Van Roy [17, 18] select coefficients through least-squares projection onto
the span of the basis functions. Other methods applying Monte Carlo to solve
(2)-(3) include Broadie and Glasserman [2, 3], Haugh and Kogan [9] and
Rogers [16]; for an overview, see Glasserman [8].

To simplify notation, we write $S_n$ for $S(t_n)$. We write $\psi_n$ for the
vector of functions $(\psi_{n0}, \ldots, \psi_{nK})^\top$. The following basic
assumptions will be in force throughout:

(A0) $\psi_{n0} \equiv 1$ for $n = 1, \ldots, m$; $E[\psi_{nk}(S_n)] = 0$, for
$n = 1, \ldots, m$; and

$\Psi_n = E[\psi_n(S_n)\,\psi_n(S_n)^\top]$

is finite and nonsingular, $n = 1, \ldots, m$.

For any square-integrable random variable $Y$ define the projection

$\Pi_n Y = \psi_n^\top(S_n)\,\Psi_n^{-1} E[Y \psi_n(S_n)].$

Thus,

(6)  $\Pi_n Y = \sum_{k=0}^{K} a_k\,\psi_{nk}(S_n)$

with

(7)  $(a_0, \ldots, a_K)^\top = \Psi_n^{-1} E[Y \psi_n(S_n)].$

We also write

$(\Pi_n Y)(x) = \sum_{k=0}^{K} a_k\,\psi_{nk}(x)$

for the function defined by the coefficients (7).

Define an approximation to (4)-(5) as follows: $C_m(x) = 0$,

(8)  $C_n(x) = (\Pi_n \max\{h_{n+1}(S_{n+1}),\, C_{n+1}(S_{n+1})\})(x).$

As in (6), the application of the projection $\Pi_n$ results in a linear
combination of the basis functions, so

(9)  $C_n(x) = (\Pi_n \max\{h_{n+1}, C_{n+1}\})(x) = \sum_{k=0}^{K} \beta_{nk}\,\psi_{nk}(x)$

with $\beta_n^\top = (\beta_{n0}, \ldots, \beta_{nK})$ defined as in (7) but
with $Y$ replaced by

(10)  $V_{n+1}(S_{n+1}) = \max\{h_{n+1}(S_{n+1}),\, C_{n+1}(S_{n+1})\}.$

With the payoff functions $h_n$ fixed, we can rewrite (9) using the operator

(11)  $L_n C_{n+1} = \Pi_n(\max\{h_{n+1}, C_{n+1}\}).$

Exact calculation of the projection in (8) is usually infeasible, but it is
relatively easy to evaluate a sample counterpart of this recursion defined
from a finite set of simulated paths of the process $S$. We consider the
following procedure to approximate the coefficient vectors $\beta_n$ and the
continuation values $C_n$.

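Step 1. Set $\hat C_m = 0$ and $\hat V_m = \max\{h_m, \hat C_m\} = h_m$.

Step 2. For each $n = 1, \ldots, m-1$, repeat the following steps: Generate
$N$ paths $\{S_1^{(i)}, \ldots, S_{n+1}^{(i)}\}$, $i = 1, \ldots, N$, up to
time $t_{n+1}$, independent of each other and of all previously generated
paths. Calculate

$\hat\gamma_n = \frac{1}{N} \sum_{i=1}^{N} \hat V_{n+1}(S_{n+1}^{(i)})\,\psi_n(S_n^{(i)}),$

calculate the coefficients $\hat\beta_n = \Psi_n^{-1}\hat\gamma_n$, and set

(12)  $\hat C_n = \hat\beta_n^\top \psi_n \equiv \hat L_n \hat C_{n+1} \equiv \hat\Pi_n \max\{h_{n+1}, \hat C_{n+1}\};$

(13)  $\hat V_n = \max\{h_n, \hat C_n\}.$

Step 3. Set $\hat C_0(S_0) = \frac{1}{N}\sum_{i=1}^{N} \hat V_1(S_1^{(i)})$ and
$\hat V_0(S_0) = \max\{h_0(S_0),\, \hat C_0(S_0)\}$.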

A few aspects of this algorithm require comment. In Step 3 we simply
average the estimated values at $t_1$ to get the continuation value at time 0
because $S(0)$ is fixed. The operators $\hat L_n$ and $\hat\Pi_n$ implicitly
defined in (12) are the sample counterparts of those in (6) and (11), using
estimated rather than exact coefficients. The coefficient estimates in Step 2
use the matrices $\Psi_n$. In ordinary least-squares regression, each
$\Psi_n$ would be replaced with its sample counterpart,

$\frac{1}{N} \sum_{i=1}^{N} \psi_n(S_n^{(i)})\,\psi_n(S_n^{(i)})^\top,$

calculated from the simulated values themselves. (Owen [14] calls the use
of the exact matrix quasi-regression.) In our examples, the $\Psi_n$ are
indeed available explicitly and using this formulation simplifies the analysis.

In Step 2 we have used an independent set of paths to estimate coefficients
at each date, though the algorithms of Longstaff and Schwartz [11]
and Tsitsiklis and Van Roy [17, 18] use a single set of paths for all dates.
This modification is theoretically convenient because it makes the
coefficients of $\hat C_{n+1}$ independent of the points at which
$\hat C_{n+1}$ is evaluated in the calculation of $\hat\gamma_n$. This
distinction is relevant only to the multiperiod analysis of Section 4 and
disappears in the single-period analysis of Section 3.
The worst case over all multiperiod problems is at least as bad as the worst 
single-period problem. The results in Section 3 thus provide lower bounds 
on the worst-case convergence rate for multiperiod problems whether one 
uses independent paths at each date or a single set of paths for all dates. 
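
The three steps of the algorithm can be sketched in a few lines for the special case treated later in the paper: standard Brownian motion with the scaled Hermite basis, for which each matrix $\Psi_n$ is the identity and the quasi-regression coefficients are plain sample averages. This is a minimal sketch under those assumptions, not a general implementation; the function names, the start value S(0) = 0 and the put-style payoff are hypothetical. Because the process is Markov, only the pair (S_n, S_{n+1}) of each path is needed, so S_n is sampled directly from its normal distribution.

```python
import math
import random


def hermite_basis(x, t, K):
    """[psi_0(x), ..., psi_K(x)] with psi_k(x) = He_k(x / sqrt(t)) / sqrt(k!)."""
    z = x / math.sqrt(t)
    he = [1.0, z]
    for k in range(1, K):
        he.append(z * he[k] - k * he[k - 1])  # He_{k+1} = z He_k - k He_{k-1}
    return [he[k] / math.sqrt(math.factorial(k)) for k in range(K + 1)]


def quasi_regression_price(payoff, times, K, N, rng):
    """Steps 1-3 for Brownian motion with S(0) = 0 and Psi_n = I (assumption)."""
    m = len(times) - 1
    coef = {}  # coef[n] holds beta_hat_n; no entry for n = m, where C_hat_m = 0

    def value(n, x):
        # V_hat_n(x) = max{h_n(x), C_hat_n(x)}
        cont = 0.0
        if n in coef:
            basis = hermite_basis(x, times[n], K)
            cont = sum(b * p for b, p in zip(coef[n], basis))
        return max(payoff(n, x), cont)

    for n in range(m - 1, 0, -1):  # Step 2, working backward from n = m - 1
        gamma = [0.0] * (K + 1)
        for _ in range(N):  # a fresh, independent set of paths for each date
            s_n = rng.gauss(0.0, math.sqrt(times[n]))
            s_next = s_n + rng.gauss(0.0, math.sqrt(times[n + 1] - times[n]))
            v = value(n + 1, s_next)
            for k, p in enumerate(hermite_basis(s_n, times[n], K)):
                gamma[k] += v * p / N
        coef[n] = gamma  # beta_hat_n = Psi_n^{-1} gamma_hat_n with Psi_n = I

    # Step 3: average the estimated values at t_1.
    c0 = sum(value(1, rng.gauss(0.0, math.sqrt(times[1])))
             for _ in range(N)) / N
    return max(payoff(0, 0.0), c0)


# Hypothetical payoff h_n(x) = max(1 - x, 0) with exercise dates 0, 1, 2, 3.
rng = random.Random(0)
price = quasi_regression_price(lambda n, x: max(1.0 - x, 0.0),
                               [0.0, 1.0, 2.0, 3.0], K=3, N=2000, rng=rng)
print(price)
```

Replacing `coef[n] = gamma` with a solve against the sample second-moment matrix would turn the sketch into ordinary least-squares regression, which is the variant the text contrasts with quasi-regression.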

3. Single-period problem. For the single-period problem, we fix dates
$t_1 < t_2$ and consider the estimation of coefficients
$\beta_0, \ldots, \beta_K$ in the projection of a function of $S_2$ onto the
span of $\psi_{1k}(S_1)$, $k = 0, \ldots, K$. Thus,

(14)  $\beta = (\beta_0, \ldots, \beta_K)^\top = \Psi^{-1}\gamma$

with $\Psi = \Psi_1$ and $\gamma = E[Y\psi_1(S_1)]$ for some $Y$. In a
simplified instance of the algorithm of the previous section, we simulate $N$
independent copies $(S_1^{(i)}, Y^{(i)})$, $i = 1, \ldots, N$, and compute the
estimate

(15)  $\hat\beta = \Psi^{-1}\hat\gamma,$

where $\hat\gamma$ is the unbiased estimator of $\gamma$ with components

(16)  $\hat\gamma_k = \frac{1}{N}\sum_{i=1}^{N} Y^{(i)}\,\psi_{1k}(S_1^{(i)}), \qquad k = 0, 1, \ldots, K.$

We analyze the convergence of $\hat\beta$ (and $\hat\gamma$) as both $N$ and
$K$ increase.

We denote by $|x|$ the Euclidean norm of the vector $x$. For a matrix $A$, we
denote by $\|A\|$ the Euclidean matrix norm, meaning the square root of the
sum of squared elements of $A$. It follows that $|Ax| \le \|A\|\,|x|$ and
then from (14) and (15),

(17)  $\frac{|\hat\gamma - \gamma|}{\|\Psi\|} \le |\hat\beta - \beta| \le \|\Psi^{-1}\|\,|\hat\gamma - \gamma|.$

The Euclidean norm on vectors is a measure of the proximity of the
functions determined by vectors of coefficients. To make this more explicit,
let $b$ and $c$ be coefficient vectors and let $S_n$ have density $g_n$. Then

$\int \Big( \sum_{k=0}^{K} b_k\psi_{nk}(x) - \sum_{k=0}^{K} c_k\psi_{nk}(x) \Big)^2 g_n(x)\,dx = (b-c)^\top \Psi_n (b-c)$

and

$\frac{1}{\|\Psi_n^{-1}\|}\,|b-c|^2 \le (b-c)^\top \Psi_n (b-c) \le \|\Psi_n\|\,|b-c|^2.$

Thus, the Euclidean norm on vectors gives the $L^2$ norm (with respect to
$g_n$) for the functions determined by the vectors, up to factors of
$\|\Psi_n\|$ and $\|\Psi_n^{-1}\|$ that will prove to be negligible in the
settings we consider.

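We therefore investigate the convergence of the expected squared difference
$E[|\hat\beta - \beta|^2]$. Because this is the mean square error of
$\hat\beta$, we also denote it by MSE($\hat\beta$). Thus, (17) implies

(18)  $\frac{1}{\|\Psi\|^2}\,E[|\hat\gamma - \gamma|^2] \le \mathrm{MSE}(\hat\beta) \le \|\Psi^{-1}\|^2\,E[|\hat\gamma - \gamma|^2].$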

For a given number of replications $N$ and basis functions $K$,
MSE($\hat\beta$) can be made arbitrarily large or small by multiplying
$\beta$ by a constant. To get meaningful results, we therefore adopt the
following normalization:

(A1) $|\beta| = 1$.

We investigate the convergence of the supremum of the MSE($\hat\beta$) over
all $\beta$ satisfying this condition. In order to investigate how $N$ must
grow with $K$, we assume that the regression representation is, in fact,
valid, in a sense implied by the following two conditions:

(A2) $Y$ has the form

$Y = \sum_{k=0}^{K} a_k\,\psi_{2k}(S_2),$

for some constants $a_k$.

(A3) There exist functions $f_k : \Re_+ \to \Re_+$, $k = 0, \ldots, K$, such that

$E[f_k(t_2)\,\psi_{2k}(S_2) \mid S_1] = f_k(t_1)\,\psi_{1k}(S_1), \qquad t_2 > t_1.$

Condition (A3) states that the $\psi_{nk}(S_n)$ are martingales, up to a
deterministic function of time. Condition (A2), though a strong assumption in
practice, makes Theorems 1 and 2 more compelling: the rapid growth in the
number of paths implied by the theorems holds even though we have chosen
the “correct” basis functions, in the sense of (A2). The results of Section 4
give sufficient conditions for convergence without such an assumption.

Under assumptions (A2) and (A3), we have

(19)  $\gamma_k = E[Y\psi_{1k}(S_1)] = E\Big[ \sum_{l=0}^{K} a_l\,\psi_{2l}(S_2)\,\psi_{1k}(S_1) \Big] = \sum_{l=0}^{K} a_l\,\frac{f_l(t_1)}{f_l(t_2)}\,E[\psi_{1l}(S_1)\,\psi_{1k}(S_1)].$

The restriction on $\beta$ in (A1) then restricts $a$.

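Returning to the analysis of MSE($\hat\beta$), (18) indicates that we need to
analyze the mean square error of $\hat\gamma$, for which (since
$E[\hat\gamma] = \gamma$) we get

(20)  $E[|\hat\gamma - \gamma|^2] = \sum_{k=0}^{K} \mathrm{Var}[\hat\gamma_k]$

(21)  $\phantom{E[|\hat\gamma - \gamma|^2]} = \frac{1}{N}\Big( \sum_{k=0}^{K} E[Y^2\psi_{1k}^2(S_1)] - \sum_{k=0}^{K} \gamma_k^2 \Big).$

Thus, using (18), (A2) and the Cauchy-Schwarz inequality,

$\mathrm{MSE}(\hat\beta) \le \|\Psi^{-1}\|^2\,E[|\hat\gamma - \gamma|^2]$

(22)  $\le \frac{\|\Psi^{-1}\|^2}{N} \sum_{k=0}^{K} E[Y^2\psi_{1k}^2(S_1)]$

$\le \frac{\|\Psi^{-1}\|^2}{N} \sum_{l=0}^{K} a_l^2 \sum_{k,j=0}^{K} E[\psi_{2j}^2(S_2)\,\psi_{1k}^2(S_1)].$

To get a lower bound, we may define $Y^* = a_K^*\,\psi_{2K}(S_2)$, with
$a_K^*$ chosen such that the corresponding $\beta^*$ satisfies
$|\beta^*| = 1$. Using (18) and (20), we then get

(23)  $\sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) \ge \frac{1}{\|\Psi\|^2}\,E[|\hat\gamma^* - \gamma^*|^2] = \frac{1}{\|\Psi\|^2 N}\Big( \sum_{k=0}^{K} E[(Y^*)^2\psi_{1k}^2(S_1)] - \sum_{k=0}^{K} (\gamma_k^*)^2 \Big).$

From (22) and (23) we see that the key to the analysis of the uniform
convergence of MSE($\hat\beta$) lies in the growth of fourth-order moments of
the form $E[\psi_{2j}^2(S_2)\,\psi_{1k}^2(S_1)]$. This, in turn, depends on
the choice of basis functions and on the law of the underlying process $S$.
We analyze the case of polynomials with Brownian motion and geometric
Brownian motion.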


3.1. Normal setting. For this section, let $\{S(t), 0 \le t \le T\}$ be a
standard Brownian motion. We define the basis functions through the Hermite
polynomials

$He_n(x) = \sum_{i=0}^{\lfloor n/2 \rfloor} \frac{(-1)^i\, n!\, x^{n-2i}}{(n-2i)!\, i!\, 2^i}, \qquad n = 0, 1, \ldots,$

where $\lfloor n/2 \rfloor$ denotes the integer part of $n/2$. The Hermite
polynomials have the following useful properties: They are orthogonal with
respect to the standard normal density $\phi$, in the sense that
$He_0 \equiv 1$ and

$\int He_i(x)\,He_j(x)\,\phi(x)\,dx = 0$ if $i \ne j$, and $= i!$ if $i = j$.

They define martingales, in the sense that (see, e.g., [15], page 151)

$E\big[ t_2^{n/2}\,He_n(S(t_2)/\sqrt{t_2}) \mid S(t_1) \big] = t_1^{n/2}\,He_n(S(t_1)/\sqrt{t_1})$

for $t_2 > t_1$. And their squares admit the expansion

(24)  $(He_n(x))^2 = (n!)^2 \sum_{i=0}^{n} \frac{He_{2i}(x)}{(i!)^2\,(n-i)!}.$

The functions

(25)  $\psi_{nk}(x) = \frac{1}{\sqrt{k!}}\,He_k(x/\sqrt{t_n})$

satisfy (A3) with $f_k(t) = t^{k/2}$. They are also orthogonal and their
$\Psi$ matrix is the identity. Thus, $\beta = \gamma$ and
$\hat\beta = \hat\gamma$.

We can now state the main result of this section. Let $\rho = t_2/t_1$ and
for $\rho > 1$ define

$c_\rho = 2\log(2 + \sqrt{\rho}).$
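
The properties just listed are easy to check numerically. The sketch below evaluates the Hermite polynomials from the explicit sum, approximates the orthogonality integral with a trapezoidal rule (a swapped-in quadrature chosen for simplicity; the truncation width and step count are arbitrary assumptions), and evaluates the constant $c_\rho$. All function names are hypothetical.

```python
import math


def hermite_explicit(n, x):
    """He_n(x) from the explicit sum over i = 0, ..., floor(n/2)."""
    return sum((-1) ** i * math.factorial(n) * x ** (n - 2 * i)
               / (math.factorial(n - 2 * i) * math.factorial(i) * 2 ** i)
               for i in range(n // 2 + 1))


def normal_inner_product(i, j, half_width=12.0, steps=4800):
    """Trapezoidal approximation of the integral of He_i He_j phi over the line."""
    h = 2.0 * half_width / steps
    total = 0.0
    for s in range(steps + 1):
        x = -half_width + s * h
        weight = 0.5 if s in (0, steps) else 1.0
        phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += weight * hermite_explicit(i, x) * hermite_explicit(j, x) * phi
    return total * h


def c_rho(rho):
    """c_rho = 2 log(2 + sqrt(rho)) for rho = t2 / t1 > 1."""
    return 2.0 * math.log(2.0 + math.sqrt(rho))


print(normal_inner_product(3, 3))  # close to 3! = 6
print(normal_inner_product(2, 4))  # close to 0
print(c_rho(2.0))
```

Dividing $He_k$ by $\sqrt{k!}$, as in (25), turns the diagonal values $k!$ into 1, which is why the $\Psi$ matrix of the scaled basis is the identity.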


Table 1

Estimates of MSE($\hat\beta$) for various combinations of $K$ basis functions
and $N$ paths. The critical values $K = \log N / c_\rho$ are displayed in the
bottom row and also indicated by the horizontal line through the table

                                              N
K          500     1000     2000     4000     8000    16000    32000    64000   128000
1         0.01     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
2         0.08     0.04     0.02     0.01     0.00     0.00     0.00     0.00     0.00
      ---------
3         0.67|    0.31     0.17     0.08     0.04     0.02     0.01     0.00     0.00
              ----------------------------
4          5.6      3.0      1.6     0.73|    0.36     0.18     0.09     0.05     0.02
                                         -------------------------------------
5         52.7     23.4     13.5      6.0      3.1      1.5      0.8     0.40|    0.20
                                                                             ---------
6        427.2    155.7     93.3     38.4     24.0     10.8      6.2      3.1      1.5
7         2403     1202    600.8    300.4    150.2     75.1     37.5     18.8      9.4
8        11447     5723     2862     1431    715.4    357.7    178.9     89.4     44.7
9                           9856     4928     2464     1232      616      308      154
10                                            6109     3054     1527      764      381
11                                                              2810     1405      702
12                                                                                1023
Bound      2.5      2.8      3.1      3.4      3.7      3.9      4.2      4.5      4.8

Theorem 1. Let $\psi_{nk}$ be as in (25) and suppose (A2) holds. If
$K = (1-\delta)\log N / c_\rho$ for some $\delta > 0$, then

(26)  $\lim_{N\to\infty} \sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) = 0.$

If $K = (1+\delta)\log N / c_\rho$ for some $\delta > 0$, then

(27)  $\lim_{N\to\infty} \sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) = \infty.$

This result shows rather precisely that, from a sample size of $N$, the
highest $K$ for which coefficients of polynomials of order $K$ can be
estimated uniformly well is $O(\log N)$. Equivalently, the sample size
required to achieve convergence grows exponentially in $K$.

This is illustrated numerically in Table 1, which shows estimates of
MSE($\hat\beta$) for various combinations of $N$ and $K$. The results shown
are for $Y = \rho^{K/2}\,\psi_{2K}(S_2)$ with $t_1 = 1$ and $t_2 = 2$, a
special case of the $Y$ we use to prove (27). The estimates are computed as
follows. For each entry of the table, we generate 5000 batches, each
consisting of $N$ paths. From each batch we compute $\hat\beta$ and then take
the average of $|\hat\beta - \beta|^2$ over the 5000 batches. This average
provides our estimate of MSE($\hat\beta$) in each case with $K \le 6$. For
$K \ge 7$ this produced unacceptably high variability, so for those cases we
calculated MSE($\hat\beta$) from 5000 replications with $N = 500{,}000$ and
then rescaled the estimate to each value of $N$ in the table, using the fact
that, by (20)-(21), MSE($\hat\beta$) is inversely proportional to $N$ here.

The bottom row of the table displays the critical values
$K = \log N / c_\rho$ provided by Theorem 1; these values are also indicated
by the horizontal line through the table. As indicated by the theorem,
MSE($\hat\beta$) explodes along any diagonal line through the table steeper
than the critical line, and remains small above the critical line.
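
The batch procedure described above can be sketched as follows. The code recomputes the critical values $\log N / c_\rho$ and runs a scaled-down Monte Carlo estimate of MSE($\hat\beta$) for $Y = \rho^{K/2}\psi_{2K}(S_2)$, whose true coefficient vector is $(0, \ldots, 0, 1)$. The function names, the seed and the reduced batch count (200 rather than 5000) are assumptions made to keep the example fast, so its output is noisier than the table entries.

```python
import math
import random


def critical_K(N, rho):
    """Critical value log N / c_rho with c_rho = 2 log(2 + sqrt(rho))."""
    return math.log(N) / (2.0 * math.log(2.0 + math.sqrt(rho)))


def psi(k, x, t):
    """psi_k(x) = He_k(x / sqrt(t)) / sqrt(k!), via the three-term recurrence."""
    if k == 0:
        return 1.0
    z = x / math.sqrt(t)
    prev, cur = 1.0, z
    for j in range(1, k):
        prev, cur = cur, z * cur - j * prev  # He_{j+1} = z He_j - j He_{j-1}
    return cur / math.sqrt(math.factorial(k))


def estimate_mse(K, N, batches, rng, t1=1.0, t2=2.0):
    """Average of |beta_hat - beta|^2 over independent batches of N paths."""
    rho = t2 / t1
    beta = [0.0] * K + [1.0]  # true coefficients for Y = rho^{K/2} psi_{2K}(S_2)
    total = 0.0
    for _ in range(batches):
        gamma_hat = [0.0] * (K + 1)
        for _ in range(N):
            s1 = rng.gauss(0.0, math.sqrt(t1))
            s2 = s1 + rng.gauss(0.0, math.sqrt(t2 - t1))
            y = rho ** (K / 2.0) * psi(K, s2, t2)
            for k in range(K + 1):
                gamma_hat[k] += y * psi(k, s1, t1) / N
        # Psi = I in this setting, so beta_hat = gamma_hat.
        total += sum((g - b) ** 2 for g, b in zip(gamma_hat, beta))
    return total / batches


for N in (500, 16000, 128000):
    print(N, round(critical_K(N, 2.0), 1))
print(estimate_mse(2, 500, 200, random.Random(0)))  # noisy estimate for K = 2, N = 500
```

For larger $K$ the summands have very heavy tails, which is consistent with the text's remark that direct batch averages became too variable for $K \ge 7$.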

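The proof of the theorem uses the following two lemmas, proved in
Section 5.

Lemma 1. For the $\psi_{nk}$ in (25) and $\rho = t_2/t_1$,

(28)  $E[\psi_{2k_1}(S_2)\,\psi_{1k_2}(S_1)] = \rho^{-k_1/2}$ if $k_1 = k_2$, and $= 0$ otherwise,

(29)  $E[\psi_{2k_1}(S_2)^2\,\psi_{1k_2}(S_1)^2] = \sum_{k=0}^{k_1 \wedge k_2} \rho^{-k} \binom{2k}{k} \binom{k_1}{k} \binom{k_2}{k},$

with $k_1 \wedge k_2$ the minimum of $k_1$ and $k_2$. Equation (29) is
strictly increasing in $k_1$ and $k_2$.

For the special case $k_1 = k_2 = K$, (29) yields

(30)  $E[\psi_{2K}(S_2)^2\,\psi_{1K}(S_1)^2] = \sum_{k=0}^{K} \rho^{-k} \binom{2k}{k} \binom{K}{k}^2.$

As a step toward bounding this expression, let $k^*$ denote the index of the
largest summand so that

(31)  $\rho^{-k^*} \binom{2k^*}{k^*} \binom{K}{k^*}^2 = \max_{0 \le k \le K} \rho^{-k} \binom{2k}{k} \binom{K}{k}^2.$

For $k^*$, we have the following lemma.

Lemma 2. As $K \to \infty$,

$k^* = \frac{2}{2 + \sqrt{\rho}}\,K\,(1 + o(1)).$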
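Proof of Theorem 1. We bound MSE($\hat\beta$) from above based on (22).
Combining the fact that $\beta = \gamma$ (because $\Psi = I$) with (19) and
(28) we get

$\beta_k = \sum_{l=0}^{K} a_l\,E[\psi_{2l}(S_2)\,\psi_{1k}(S_1)] = a_k\,\rho^{-k/2}.$

Thus, $|\beta| = 1$ implies $a_k^2 \le \rho^k$. From (30) and (31), we get

(32)  $\rho^{-k^*} \binom{2k^*}{k^*} \binom{K}{k^*}^2 \le E[\psi_{2K}(S_2)^2\,\psi_{1K}(S_1)^2] \le (K+1)\,\rho^{-k^*} \binom{2k^*}{k^*} \binom{K}{k^*}^2.$

Recalling that $\Psi = I$ and applying the inequality $a_l^2 \le \rho^l$ to
(22) we get

$\sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) \le \sup_{|\beta|=1} \frac{1}{N} \sum_{l=0}^{K} a_l^2 \sum_{k,j=0}^{K} E[\psi_{2j}^2(S_2)\,\psi_{1k}^2(S_1)]$

$\le \frac{1}{N} \sum_{l=0}^{K} \rho^l \sum_{k,j=0}^{K} E[\psi_{2j}^2(S_2)\,\psi_{1k}^2(S_1)]$

(33)  $\le \frac{(K+1)^3}{N}\,\rho^K\,E[\psi_{2K}^2(S_2)\,\psi_{1K}^2(S_1)]$

(34)  $\le \frac{(K+1)^4}{N}\,\rho^{K-k^*} \binom{2k^*}{k^*} \binom{K}{k^*}^2,$

where (33) follows from Lemma 1 and the last inequality follows from (32).

To get a lower bound on the supremum of MSE($\hat\beta$) we use (23) with

$Y^* = \rho^{K/2}\,\psi_{2K}(S_2) \equiv a_K^*\,\psi_{2K}(S_2),$

for which $\beta_K = 1$ and $\beta_k = 0$, $k \ne K$. By applying Lemma 1 and
the lower bound in (32), (23) becomes

$\sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) \ge \frac{1}{N}\Big( \rho^K \sum_{k=0}^{K} E[\psi_{2K}^2(S_2)\,\psi_{1k}^2(S_1)] - 1 \Big)$

$\ge \frac{1}{K+1}\,\frac{1}{N}\,\rho^K\,E[\psi_{2K}^2(S_2)\,\psi_{1K}^2(S_1)]$

(35)  $\ge \frac{1}{K+1}\,\frac{1}{N}\,\rho^{K-k^*} \binom{2k^*}{k^*} \binom{K}{k^*}^2.$

By Stirling’s approximation $n! \sim \sqrt{2\pi n}\,(n/e)^n$ and Lemma 2 we get

(36)  $\binom{K}{k^*} = \frac{K!}{k^*!\,(K-k^*)!} = \frac{\sqrt{2\pi K}\,(K/e)^K\,(1+o(1))}{\sqrt{2\pi k^*}\,\sqrt{2\pi(K-k^*)}\,(k^*/e)^{k^*}\,((K-k^*)/e)^{K-k^*}} = \frac{1+o(1)}{\sqrt{2\pi abK}\;a^{aK}\,b^{bK}}$

with $a = 2/(2+\sqrt{\rho})$ and $b = 1 - a$. Also,

(37)  $\binom{2k^*}{k^*} = \frac{(2k^*)!}{k^*!\,k^*!} = \frac{\sqrt{4\pi k^*}\,(2k^*/e)^{2k^*}\,(1+o(1))}{2\pi k^*\,(k^*/e)^{2k^*}} = \frac{2^{2aK}}{\sqrt{aK\pi}}\,(1+o(1)).$

By substituting (36) and (37) into (34) and (35) we get

$\frac{\rho^{bK}\,2^{2aK}}{2N(K+1)\sqrt{aK\pi}\;abK\pi\;a^{2aK}\,b^{2bK}}\,(1+o(1)) \le \sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) \le \frac{(K+1)^4\,\rho^{bK}\,2^{2aK}}{2N\sqrt{aK\pi}\;abK\pi\;a^{2aK}\,b^{2bK}}\,(1+o(1)).$

Simple algebra verifies that
$c_\rho = 2a\log(2) - 2a\log(a) - 2b\log(b) + b\log(\rho)$, so we can rewrite
these bounds as

$\frac{e^{c_\rho K}}{2N(K+1)\sqrt{aK\pi}\;abK\pi}\,(1+o(1)) \le \sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) \le \frac{(K+1)^4\,e^{c_\rho K}}{2N\sqrt{aK\pi}\;abK\pi}\,(1+o(1)).$

If $K = (1-\delta)\log N / c_\rho$ for some $\delta > 0$, then as $N \to \infty$,

$\log\Big\{ \frac{(K+1)^4\,e^{c_\rho K}}{2N\sqrt{aK\pi}\;abK\pi}\,(1+o(1)) \Big\} = -\delta\log N + o(\log N) \to -\infty,$

so (26) holds. If $K = (1+\delta)\log N / c_\rho$ for some $\delta > 0$, then
as $N \to \infty$,

$\log\Big\{ \frac{e^{c_\rho K}}{2N(K+1)\sqrt{aK\pi}\;abK\pi}\,(1+o(1)) \Big\} = \delta\log N + o(\log N) \to \infty,$

and (27) holds. □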


3.2. Lognormal setting. We now take $S$ to be geometric Brownian motion,
$S(t) = \exp(W(t) - t/2)$, with $W$ a standard Brownian motion. For
the basis functions $\psi_{nk} \equiv \psi_k$, we use multiples of the powers
$x^k$ to get the martingales

(38)  $\psi_k(S(t)) = e^{kW(t) - k^2 t/2}.$

These functions satisfy (A0). The main result of this section is the following:


Theorem 2. Let the $\psi_k$ be as in (38) and suppose (A2) holds. If

$K = \sqrt{\frac{(1-\delta)\log N}{5t_1 + t_2}}$

for some $\delta > 0$, then

$\lim_{N\to\infty} \sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) = 0.$

If

$K = \sqrt{\frac{(1+\delta)\log N}{3t_1 + t_2}}$

for some $\delta > 0$, then

$\lim_{N\to\infty} \sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) = \infty.$

Compared with the normal case in Theorem 1, we see that here $K$ must
be much smaller, of the order of $\sqrt{\log N}$. Accordingly, $N$ must be
much larger, of the order of $\exp(K^2)$. The analysis in this setting is
somewhat more complicated than in the normal case because the $\psi_k$ are no
longer orthogonal. To prove the theorem we state some lemmas that are proved
in Section 5.
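
To get a feel for the two growth rates, the thresholds in Theorems 1 and 2 can be inverted to give the order of the sample size at which a given $K$ becomes critical. The sketch below is only an order-of-magnitude illustration under the choice $t_1 = 1$, $t_2 = 2$; the function names are hypothetical and the theorems themselves are asymptotic statements.

```python
import math


def paths_normal(K, t1, t2):
    """exp(c_rho K): N at which K equals the critical value log N / c_rho."""
    rho = t2 / t1
    return math.exp(2.0 * math.log(2.0 + math.sqrt(rho)) * K)


def paths_lognormal(K, t1, t2):
    """(exp((3 t1 + t2) K^2), exp((5 t1 + t2) K^2)): the thresholds of Theorem 2."""
    return (math.exp((3.0 * t1 + t2) * K * K),
            math.exp((5.0 * t1 + t2) * K * K))


for K in range(1, 7):
    low, high = paths_lognormal(K, 1.0, 2.0)
    print(K, f"{paths_normal(K, 1.0, 2.0):.3g}", f"{low:.3g}", f"{high:.3g}")
```

The first column grows geometrically in $K$ while the other two grow like $\exp(K^2)$, which is the contrast the text draws between the normal and lognormal settings.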


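Lemma 3. For $t_2 > t_1$ and $k_1, k_2 = 0, \ldots, K$,

$E[\psi_{k_1}(S_1)\,\psi_{k_2}(S_2)] = e^{k_1 k_2 t_1}, \qquad E[\psi_{k_1}(S_1)^2\,\psi_{k_2}(S_2)^2] = e^{(k_1^2 + 4k_1k_2)t_1 + k_2^2 t_2},$

and $E[\psi_{k_1}(S_1)^2\,\psi_{k_2}(S_2)^2]$ is strictly increasing in $k_1$
and $k_2$.

Using the first statement in the lemma, we find that the matrix $\Psi(t)$
with $ij$th entry $E[\psi_{i-1}(S(t))\,\psi_{j-1}(S(t))]$ is given by

$\Psi(t) = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & e^{t} & e^{2t} & \cdots & e^{Kt} \\ 1 & e^{2t} & e^{4t} & \cdots & e^{2Kt} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & e^{Kt} & e^{2Kt} & \cdots & e^{K^2 t} \end{pmatrix}.$

We write $\Psi$ for $\Psi(t_1)$.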
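
The matrix above is simple to build directly, since each entry is $E[e^{(i+j)W(t) - (i^2+j^2)t/2}] = e^{ijt}$ by the normal moment generating function. The sketch below constructs it and evaluates the Euclidean matrix norm used in Section 3; the function names are hypothetical. The comparison value $(K+1)e^{K^2 t}$ printed at the end is only the elementary bound obtained from the largest of the $(K+1)^2$ entries, not a statement taken from Lemma 4.

```python
import math


def psi_matrix(K, t):
    """(K+1) x (K+1) matrix with (i, j) entry exp(i * j * t), i, j = 0, ..., K."""
    return [[math.exp(i * j * t) for j in range(K + 1)] for i in range(K + 1)]


def euclidean_norm(A):
    """Square root of the sum of squared entries of A."""
    return math.sqrt(sum(a * a for row in A for a in row))


K, t = 3, 0.5
P = psi_matrix(K, t)
print(P[1])                                          # second row: 1, e^t, e^{2t}, e^{3t}
print(euclidean_norm(P), (K + 1) * math.exp(K * K * t))
```

The rapid growth of the bottom-right entry $e^{K^2 t}$ is what makes the norms of $\Psi$ and its inverse appear in the bounds for this setting.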


Lemma 4. We/ia?;e ||'I'(t)|| < and, with C(t) = e^'p{—2e/{e^ 

Il'hWiI < C-\t)K{K + l)^-^y. 


Proof of Theorem 2. Condition (A2) and the martingale property
of the $\psi_k(S(t))$ imply that

$E[Y \mid S_1] = \sum_{k=0}^{K} a_k\,\psi_k(S_1),$

and, thus, that $\beta_k = a_k$, $k = 0, 1, \ldots, K$. In this case, the
normalization $|\beta| = 1$ is equivalent to $|(a_0, \ldots, a_K)| = 1$.
Applying this in (22) and then applying Lemmas 3 and 4 we get


1 

sup MSE(^)< sup ||4' ^f^E J2^kiS2)iJK{Si) 
l/3|=l l/3|=l ^ U=0
< 11^-


- 1||2 


-(K + l)E[^j;l{S2)^l{Sl)] 


N 




2K 


K + 1 

N ' 


,5K^ti+K^t2 


If we now take K = then as —> oo, 

logf^C{hfK\K + + o(l))| 

= —5 log N + o(log N) ^ —oo, 

which proves the first assertion in the theorem. 

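For the second part of the theorem, define

$Y^* = e^{KW(t_2) - K^2 t_2/2},$

for which $\beta^*$ is $(0, \ldots, 0, 1)^\top$. The corresponding vector
$\gamma^*$ is $\Psi\beta^*$, the last column of $\Psi$. Applying this in (23)
and using Lemmas 3 and 4 we get

$\sup_{|\beta|=1} \mathrm{MSE}(\hat\beta) \ge \frac{1}{\|\Psi\|^2}\,\frac{1}{N}\Big( \sum_{k=0}^{K} E[(Y^*)^2\psi_k^2(S_1)] - \sum_{k=0}^{K} (\gamma_k^*)^2 \Big)$

$\ge \frac{1}{\|\Psi\|^2}\,\frac{1}{N}\Big( E[\psi_K^2(S_2)\,\psi_K^2(S_1)] - (\gamma_K^*)^2 \Big)$

$\ge \frac{1}{N(K+1)^2 e^{2K^2 t_1}}\Big( e^{5K^2 t_1 + K^2 t_2} - e^{2K^2 t_1} \Big)$

$= \frac{1}{N(K+1)^2}\,e^{3K^2 t_1 + K^2 t_2}\,(1 + o(1)).$

If we now take $K = \sqrt{(1+\delta)\log N/(3t_1 + t_2)}$, then as
$N \to \infty$,

$\log\Big\{ \frac{e^{(3t_1 + t_2)K^2}}{N(K+1)^2}\,(1 + o(1)) \Big\} = \delta\log N + o(\log N) \to \infty,$

proving the second assertion in the theorem. □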


The analysis of this section differs from the normal setting of Section 3.1
in that the polynomials (38) are not orthogonal. In the Brownian case, the
Hermite polynomials are orthogonal and (after appropriate scaling)
martingales. In using (38), we have chosen to preserve the martingale property
rather than orthogonality. As a consequence $\|\Psi^{-1}\|$ and
$1/\|\Psi\|$ appear in our bounds on MSE($\hat\beta$). From Lemma 4 we see
that $\|\Psi^{-1}\|$ has an asymptotically negligible effect on the upper
bound for MSE($\hat\beta$), and with or without the factor of $1/\|\Psi\|$,
the lower bound on MSE($\hat\beta$) is exponential in a multiple of $K^2$. The
slower convergence rate in the lognormal setting therefore does not appear to
result from the lack of orthogonality.

4. Multiperiod problem. We now turn to conditions that ensure convergence
of the multiperiod algorithm in Section 2.2 as both the number of
basis functions $K$ and the number of paths $N$ increase. We first formulate a
general result bounding the error in the estimated continuation values, then
specialize to the normal and lognormal settings.

4.1. General bound. We use the following conditions. 

(B1) $E[\psi_{nk}^2(S_n)]$ and $E[\psi_{nk}^4(S_n)]$ are increasing in $n$ and $k$.

As explained in the discussion of the single-period problem, we need some 
normalization on the regression coefficients in order to make meaningful 
statements about worst-case convergence. For a problem with m exercise 
opportunities, we impose 

(B2) $|\beta_{m-1}| = 1$.

This condition is analogous to the one we used in the single-period problem,
where $\beta$ was a vector of coefficients at time $t_1$ and $Y$ was a linear
combination of functions evaluated at $S(t_2)$.

We also need a condition on the functions $h_n$ that determine the payoff
upon exercise at time $t_n$. The following condition turns out to be convenient:

(B3) E[hf^{Sn)] < for n = 0,l,...,m. 

Suppose $S_n$ has density $g_n$ and define the weighted norm on functions

$\|G\|_n = \sqrt{\int G(x)^2 g_n(x)\,dx}.$

With $\hat C_n$ the estimated continuation value defined by (12), we analyze
the error $E[\|\hat C_n - C_n\|_n^2]$.

We need some additional notation. Let 

c= max Bk= max ||Tr^||, 

1 tfi 1 

Hk = max{c^, + 1)}, Ak = {K + 1)/7;^E[V^^^(A^)]. 

Under (A0), $B_K$ is well defined. We can now state the main result of this
section.

Theorem 3. If assumptions (AO) and (B1)-(B3) hold, then 

(39) E[||C„ - CJl] < (2”-” - 1) A^Bx.4“(E|^^*-(S„)|)=(1 + o(1)). 

This result is proved in Section 6. Its consequences will be clearer once
we illustrate it in the normal and lognormal settings.

4.2. Multiperiod examples.

4.2.1. Normal setting. As in Section 3.1, let $S$ be a standard Brownian
motion and let the $\psi_{nk}$ be as in (25). Each $\Psi_n$ is then the
identity matrix, $n = 1, \ldots, m$. It follows that

(40)  $B_K = \max_n \|\Psi_n^{-1}\| = \sqrt{K+1}.$

Also, 

Hk = max{c'^,i?^(Ar + 1)} = 


for all sufficiently large K. 

To bound E[?/)^;^(S'm)] (which appears in Ak), we use (29) (with ti =t 2 
and ki = k 2 = K) and then Stirling’s formula and Lemma 2 to get 


The expression on the right follows from (32), (36) and (37) upon noting
that with $\rho = 1$ we get $a = 2/3$ and $b = 1/3$. Substituting this
expression and (40) into (39) yields


E[\\Cn-C„ 


< ( 2 ™-” - 1 ) 


{K + l)2m-2n+5/2 / 


N 


It now follows that if 


K = 


\4:V2K^tt^ 

(1 — 6) logN 
(m — n) (2 log 3 + log c) 


\ nt—lb 

^2K\ c(”^-^)^(l + o(l)). 


for some 6 > 0, then 
(41) 


lim sup E[\\Cn-Cn\\l]=0. 


In other words, we have convergence of the estimated continuation values at
all exercise opportunities, as both $N$ and $K$ increase. If the basis
functions eventually span the true optimum, in the sense that
$\|C_n - C_n^*\|_n \to 0$ as $K \to \infty$, then by the triangle inequality,
(41) holds with $C_n$ replaced by $C_n^*$.

On the other hand, from Theorem 1 we know that if
$K = (1+\delta)\log N / c_\rho$ for any $\delta > 0$, with
$\rho = t_m/t_{m-1}$, then

$\lim_{N\to\infty} \sup E[\|\hat C_{m-1} - C_{m-1}\|_{m-1}^2] = \infty.$

Thus, the critical rate of $K$ for the multiperiod problem is $O(\log N)$,
just as in the single-period problem.

4.2.2. Lognormal setting. Now we take $S$ to be geometric Brownian motion
and use the basis functions of Section 3.2. In this case we have


Bk = max I 

n 




-ii 


< maxC 

n 


-1 


(4 


- 1 


K-l 


< e 


2e/e*l-l 


3*1 — 1 


K-l 


the first inequality following from Lemma 4, the second following from the 
fact that both C{tn) and achieve their maximum values at n = 1. 

As in Lemma 3, we have $E[\psi_{mK}^4(S_m)] = \exp(6K^2 t_m)$ and
$E[\psi_{mK}^2(S_m)] = \exp(K^2 t_m)$. Making these substitutions in $A_K$ and in (39), we get


(42) 


E[\\Cn-Cn\\l]<{2^---l) 


( a : + i )”"-”+2 


N 




The polynomial factor in $K+1$ is negligible compared to the exponential factor in (42).
The factors $B_K$ and $H_K$ grow exponentially in $K$, but their exponents are
linear in $K$, whereas the dominant exponent in (42) is quadratic in $K$. Thus,
$B_K$ and $H_K$ are also negligible for large $K$. If we set

$K = \sqrt{\dfrac{(1-\delta)\log N}{(6(m-n)+2)\,t_m}}$

for any $\delta > 0$, then

$\limsup_{N\to\infty} E[\|\hat C_n - C_n\|_n^2] = 0.$


On the other hand, we know from Theorem 2 that if 

K = 


^ (1 + 6) log A* 
3tm, T tm—1 


for any $\delta > 0$, then

$\limsup_{N\to\infty} E[\|\hat C_{m-1} - C_{m-1}\|_{m-1}^2] = \infty.$

Thus, the critical rate of $K$ for the multiperiod problem is $O(\sqrt{\log N})$, just
as in the single-period problem.
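
To get a feel for how restrictive a rate of order $\sqrt{\log N}$ is, the sufficient condition in the lognormal setting can be inverted to give the number of paths associated with a given degree. The sketch below assumes that condition reads $K = \sqrt{(1-\delta)\log N / ((6(m-n)+2)t_m)}$; the function names, the argument `periods_left` (standing for $m-n$) and the default $\delta = 0.1$ are illustrative choices, not part of the paper.

```python
import math


def max_degree_lognormal(num_paths: float, periods_left: int, t_m: float,
                         delta: float = 0.1) -> float:
    """Degree K given by K = sqrt((1 - delta) log N / ((6 (m - n) + 2) t_m)).

    periods_left stands for m - n; the names are hypothetical.
    """
    return math.sqrt((1.0 - delta) * math.log(num_paths)
                     / ((6 * periods_left + 2) * t_m))


def log_paths_needed_lognormal(degree: float, periods_left: int, t_m: float,
                               delta: float = 0.1) -> float:
    """Natural log of the N for which `degree` satisfies the same rule."""
    return degree ** 2 * (6 * periods_left + 2) * t_m / (1.0 - delta)


if __name__ == "__main__":
    for K in (1, 2, 3, 4):
        log_n = log_paths_needed_lognormal(K, periods_left=3, t_m=1.0)
        print(f"K = {K}: log10(N) is about {log_n / math.log(10):.1f}")
```

Under this reading, with $m - n = 3$, $t_m = 1$ and $\delta = 0.1$, degree $K = 3$ already corresponds to $\log N = 200$, that is, roughly $10^{87}$ paths.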


5. Proofs for the single-period problem. 

5.1. Normal setting. 

Proof of Lemma 1. Equation (28) follows immediately from the orthogonality
and martingale properties of the Hermite polynomials. Using
(24), we get


) . 


V^- 

= {kl\k2\) ^ ^ {S2/^/h)He^^{Sl/^)\ 


fc = 0«=0 
fcl Afc2 


{k\)‘^{k2-ky.{i\y{ki-iy. 


_ (1 |T n _ (2fc)! (ti/t2) _ 

to if^'-V{k2-ky.{kif{ki-ky 


fcl Afc 2 

= E/^- 

k=0 


_k (2k\ (ki 


k 


k 


The fourth equality applies (28). □ 
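
As a sanity check on the sum in Lemma 1, consider the special case used later for the fourth moment: $t_1 = t_2$ (so $\rho = 1$) and $k_1 = k_2 = K$. Assuming the normalized Hermite basis $\psi_K(x) = He_K(x)/\sqrt{K!}$ evaluated at a standard normal variable, and assuming the final sum then reduces to $\sum_{k=0}^{K}\binom{2k}{k}\binom{K}{k}^2$, the following sketch compares that sum with a Gauss-Hermite quadrature of $E[\psi_K^4]$. The function names are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def fourth_moment_quadrature(K: int) -> float:
    """E[psi_K(Z)^4] for psi_K = He_K / sqrt(K!), Z ~ N(0, 1), by quadrature."""
    # A rule with 2K + 1 nodes is exact for polynomials of degree <= 4K + 1.
    nodes, weights = He.hermegauss(2 * K + 1)
    coeffs = np.zeros(K + 1)
    coeffs[K] = 1.0                      # coefficient vector selecting He_K
    he_k = He.hermeval(nodes, coeffs)
    # The weights integrate against exp(-x^2 / 2); normalize to a density.
    integral = np.sum(weights * he_k ** 4) / math.sqrt(2.0 * math.pi)
    return float(integral) / math.factorial(K) ** 2


def fourth_moment_sum(K: int) -> int:
    """Assumed closed form: sum over k of C(2k, k) * C(K, k)^2."""
    return sum(math.comb(2 * k, k) * math.comb(K, k) ** 2 for k in range(K + 1))


if __name__ == "__main__":
    for K in range(1, 7):
        print(K, fourth_moment_sum(K), fourth_moment_quadrature(K))
```

For $K = 1$ both sides equal $E[Z^4] = 3$, which is a quick way to confirm the normalization.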


Proof of Lemma 2. The ratio between the $(k+1)$st summand and
the $k$th summand in (30) is

$r_{kK} = \dfrac{\rho^{-(k+1)}\binom{2k+2}{k+1}\binom{K}{k+1}^2}{\rho^{-k}\binom{2k}{k}\binom{K}{k}^2} = \dfrac{2(2k+1)(K-k)^2}{\rho(k+1)^3}.$

For $0 \le k \le K-1$, its derivative with respect to $k$ is

$\dfrac{1}{\rho(k+1)^4}\,\bigl(8kK(k-K) + 4(k-K) + 10k^2 - 8kK - 2K^2\bigr) < 0.$

Thus, $r_{kK}$ is strictly decreasing in $k$. At $k = 0$, $r_{kK} = 2K^2/\rho$, which is greater
than 1 for all sufficiently large $K$; and at $k = K-1$,

$r_{K-1,K} = \dfrac{2(2K-1)}{\rho K^3},$

which is less than 1 for all $K \ge 2$. Thus, for all sufficiently large $K$, $k^*$ is
characterized by the condition

$k^* = \min\{k : r_{kK} < 1\}.$

The condition $r_{kK} < 1$ is equivalent to

(43) $\dfrac{4(K-k)^2}{\rho(k+1)^2} < \dfrac{2k+2}{2k+1},$
and $(2k+2)/(2k+1)$ is greater than 1 for all positive $k$. The ratio on the
left-hand side of (43) is decreasing in $k$, $0 \le k \le K-1$, so if we define

(44) $k_1 = \min\left\{k : \dfrac{4(K-k)^2}{\rho(k+1)^2} \le 1\right\},$

then $k^* \le k_1$.

For any fixed $k$, the inequality in (43) will be violated for all sufficiently
large $K$, so $k^*$ must increase without bound as $K \to \infty$. It follows that
$(2k^*+2)/(2k^*+1) \to 1$. If for some $\epsilon > 0$, we define

$k_2 = \min\left\{k : \dfrac{4(K-k)^2}{\rho(k+1)^2} \le 1+\epsilon\right\},$

then $k^* \ge k_2$ for all sufficiently large $K$. Thus, $k_2 \le k^* \le k_1$.

For $k_1$, we examine the equation

$\dfrac{4(K-k)^2}{\rho(k+1)^2} = 1.$

The only root of this equation less than $K$ is

$\bar k = \dfrac{2K - \sqrt{\rho}}{2+\sqrt{\rho}} = \dfrac{2}{2+\sqrt{\rho}}\,K(1+o(1)).$

The solution $k_1$ to (44) is either $\lfloor \bar k \rfloor$ or $\lfloor \bar k \rfloor + 1$, so $k_1/\bar k \to 1$.
The same argument applied to the equation

$\dfrac{4(K-k)^2}{\rho(k+1)^2} = 1+\epsilon$

shows that

$k_2 = \dfrac{2}{2+\sqrt{\rho(1+\epsilon)}}\,K(1+o(1)).$

Noting that we may take $\epsilon > 0$ arbitrarily small and $k_2 \le k^* \le k_1$ concludes
the proof. □
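
The limit $k^*/K \to 2/(2+\sqrt{\rho})$ can be checked numerically. The sketch below assumes the summand in (30) has the form $\rho^{-k}\binom{2k}{k}\binom{K}{k}^2$, so that consecutive summands have ratio $2(2k+1)(K-k)^2/(\rho(k+1)^3)$; it locates $k^*$ both from this ratio and by a direct search over log-summands. The function names are hypothetical.

```python
import math


def ratio(k: int, K: int, rho: float) -> float:
    """Ratio of the (k+1)st to the kth summand rho^{-k} C(2k,k) C(K,k)^2."""
    return 2.0 * (2 * k + 1) * (K - k) ** 2 / (rho * (k + 1) ** 3)


def k_star(K: int, rho: float) -> int:
    """Smallest k at which the ratio drops below 1."""
    for k in range(K):
        if ratio(k, K, rho) < 1.0:
            return k
    return K


def argmax_summand(K: int, rho: float) -> int:
    """Index of the largest summand, found from log-summands via lgamma."""
    def log_summand(k: int) -> float:
        log_c2k = math.lgamma(2 * k + 1) - 2.0 * math.lgamma(k + 1)
        log_ck = math.lgamma(K + 1) - math.lgamma(k + 1) - math.lgamma(K - k + 1)
        return -k * math.log(rho) + log_c2k + 2.0 * log_ck
    return max(range(K + 1), key=log_summand)


if __name__ == "__main__":
    K = 3000
    for rho in (1.0, 2.0, 4.0):
        ks = k_star(K, rho)
        print(rho, ks / K, 2.0 / (2.0 + math.sqrt(rho)), argmax_summand(K, rho))
```

With $\rho = 1$ the limiting fraction is $2/3$, the value of $a$ used in the normal multiperiod example.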


5.2. Lognormal setting. 


Proof of Lemma 3. Using the martingale property of $\psi_k(S(t))$ and
the moment generating function of $W(t_1)$, we get

$$\begin{aligned}
E[\psi_{k_1}(S(t_1))\psi_{k_2}(S(t_2))] &= E\bigl[E[\psi_{k_1}(S(t_1))\psi_{k_2}(S(t_2)) \mid W(t_1)]\bigr] \\
&= E[\psi_{k_1}(S(t_1))\psi_{k_2}(S(t_1))] \\
&= E\bigl[e^{(k_1+k_2)W(t_1) - k_1^2 t_1/2 - k_2^2 t_1/2}\bigr] \\
&= e^{k_1 k_2 t_1}.
\end{aligned}$$

The second part of the lemma works similarly. □
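
The moment identities behind Lemma 3 are easy to confirm numerically. The sketch below assumes the basis functions take the exponential-martingale form $\psi_k(S(t)) = \exp(kW(t) - k^2 t/2)$, as the exponent in the display above suggests, and evaluates Gaussian expectations by Gauss-Hermite quadrature. It checks $E[\psi_{k_1}(S(t_1))\psi_{k_2}(S(t_1))] = e^{k_1 k_2 t_1}$ and the fourth moment $E[\psi_K^4(S(t))] = e^{6K^2 t}$. The helper names and the choice of 60 nodes are illustrative.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite rule for the weight exp(-z^2 / 2); dividing by sqrt(2 pi)
# turns the weights into standard normal probabilities.
NODES, WEIGHTS = hermegauss(60)
WEIGHTS = WEIGHTS / math.sqrt(2.0 * math.pi)


def expect(f, t: float) -> float:
    """Approximate E[f(W(t))] for W(t) ~ N(0, t)."""
    return float(np.sum(WEIGHTS * f(math.sqrt(t) * NODES)))


def psi(k: int, w, t: float):
    """Assumed basis function psi_k(S(t)) = exp(k W(t) - k^2 t / 2)."""
    return np.exp(k * w - 0.5 * k * k * t)


def cross_moment(k1: int, k2: int, t1: float) -> float:
    """E[psi_{k1}(S(t1)) psi_{k2}(S(t1))], expected to equal exp(k1 k2 t1)."""
    return expect(lambda w: psi(k1, w, t1) * psi(k2, w, t1), t1)


def fourth_moment(K: int, t: float) -> float:
    """E[psi_K(S(t))^4], expected to equal exp(6 K^2 t)."""
    return expect(lambda w: psi(K, w, t) ** 4, t)


if __name__ == "__main__":
    print(cross_moment(1, 2, 0.5), math.exp(1 * 2 * 0.5))
    print(fourth_moment(2, 0.25), math.exp(6 * 2 ** 2 * 0.25))
```

Quadrature is used instead of simulation because sample averages of these lognormal quantities converge slowly for even moderate $K$.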


Proof of Lemma 4. The first assertion follows from the observation
that the largest entry of $\Psi$ is $e^{K^2 t}$. For the second assertion, we note that
$\Psi$ has the form of a Vandermonde matrix, allowing calculation of its determinant
(using [12], page 322),

(45) $\det\Psi = \prod_{0 \le q < r \le K} (e^{rt} - e^{qt}).$

By standard linear algebra, the inverse of $\Psi$ is given by

(46) $\Psi^{-1} = \dfrac{\Psi^*}{\det\Psi},$

where

$\Psi^*_{qr} = (-1)^{q+r}\det\Psi(q|r),$

and $\Psi(q|r)$ denotes the matrix obtained by deleting the $q$th row and $r$th
column from $\Psi$. Two cases arise, depending on whether $q = r = 1$ or not.
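
The two linear-algebra facts used here can be verified on a small instance. The sketch below assumes that $\Psi$ has entries $e^{qrt}$ for $q, r = 0, \dots, K$ (the form suggested by Lemma 3), so that it is a Vandermonde matrix with nodes $e^{qt}$. It compares the product formula for the determinant with a numerical determinant, and builds the inverse from cofactors as in (46). The function names are hypothetical, and indices are zero-based, which does not change the parity of $q + r$.

```python
import numpy as np


def psi_matrix(K: int, t: float) -> np.ndarray:
    """Matrix with entries exp(q r t), q, r = 0..K (assumed form of Psi)."""
    idx = np.arange(K + 1)
    return np.exp(np.outer(idx, idx) * t)


def vandermonde_det(K: int, t: float) -> float:
    """Product over q < r of (exp(r t) - exp(q t))."""
    x = np.exp(np.arange(K + 1) * t)
    det = 1.0
    for r in range(1, K + 1):
        for q in range(r):
            det *= x[r] - x[q]
    return det


def inverse_via_cofactors(M: np.ndarray) -> np.ndarray:
    """Inverse as adjugate / determinant, with cofactors from deleted minors."""
    n = M.shape[0]
    adj = np.empty_like(M)
    for q in range(n):
        for r in range(n):
            minor = np.delete(np.delete(M, q, axis=0), r, axis=1)
            # The adjugate is the transpose of the cofactor matrix.
            adj[r, q] = (-1) ** (q + r) * np.linalg.det(minor)
    return adj / np.linalg.det(M)


if __name__ == "__main__":
    K, t = 3, 0.5
    M = psi_matrix(K, t)
    print(np.linalg.det(M), vandermonde_det(K, t))
    print(np.abs(inverse_via_cofactors(M) @ M - np.eye(K + 1)).max())
```

Because the matrix is symmetric, the transpose in the adjugate is immaterial here, which is the observation used at the start of Case 1.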


Case 1. $q \ne 1$ or $r \ne 1$. Since $\Psi$ is symmetric, $\det\Psi(q|r) = \det\Psi(r|q)$, so
it suffices to suppose $r \ne 1$. We can then compute the determinant of $\Psi(q|r)$
using [12], page 333. Through (46) this leads to

the sum ranging over si,..., taking values in {0,..., K}. 


The lemma requires an upper bound on the numerator and a lower bound
on the denominator. To bound the numerator, for $\bar r = 1, \dots, K-1$, set


R{K,q,f)= expi^Sdt 

We now claim that 


(48) 


R{K, q, f) < R{K, 1, f) < 


([gi _ l)rgf(r-l)t/2 


for $\bar r = 1, \dots, K-1$. That $R(K, q, \bar r) \le R(K, 1, \bar r)$ is immediate from the
definition of $R(K, q, \bar r)$. The second inequality is proved by induction in $\bar r$. When
$\bar r = 1$,


/?(iL,l,r) = (e* + --- + e^*)< 


p{K+l)t 
Kt\ ^ 


e* — 1 
Then


R{K,l,f + l)= ^ expj^Srfij 

K—rK—r+1 K (r-\-l 

= Y. Y. ••• Y 

il=l 12=11+1 if.+i=f+l li=l 

ii=l 


(49) 

Thus, (48) holds. 
The fact that 


K-f 


<Y 



^r(K+l)t 


g(A+l-f)t gf(A+l)t 

_ ]^^f+lgr(f+l)t /2 * 


d / \ ^ 

^ V(e* - l)^e^(^-i )‘/2 j 


implies that (49) achieves its maximum when f = K — 1. Thus, 

9; 1^) < ^-gt _ 2)A-lg(A-l)(A-2)t/2 ’ 

for g = 1 ,..., iC + 1 , r = 2 ,..., iC + 1 . 

Next, we show that the denominator of (47) is bounded below by
$C(t)\exp(K(K+1)t/2)$, with $C(t) = \exp(-2e/(e^t-1)^2)$. For this, we rewrite the
denominator of (47) as

J](e('i-i)t_g.t)^(gii_gte-i)i) 

j=0 j=q 


q-2 


j=0 


Ji 




K 


3=<1 


;(<?-l)i 

eJt 


r, -^K 1 “! / 1 \ K—q+1 


n (i 

i=i^ ^ j=i 


=(g-i)i 


(51) 
> e


K(K+l)t/2+{q-l){q-2)/2 


K 


n ‘ 


1 

eJt 


n 1 

i=i ^ 


1 

eii 


K 




i=i 

Taking the logarithm of the product over j and applying a Taylor expansion 
yields terms of the form 


log 1 - 


1 

eh 


1 


1 


efi 2e2h 


1 


ne"'h 


^ 1 


1 


1 


Therefore, 

and 
(52) 


K 


f=l 


gh e(i+i)t 
1 e* 

git gt _ 1 • 

=t 


g(n-l+i)t 




—e 


e* — 1 eh (e* — 1 )' 


:(l + o(l)) 


A 


n(i-T) 

i=i 


Finally, by (51) and (52), we get that the denominator of (47) is bounded
below by

(53) $e^{-2e/(e^t-1)^2}\, e^{K(K+1)t/2}.$

Applying this lower bound and (50) to (47), we get

(54) 4.-1 <,=•/(•-■) (_^ =c->(«)(^) 

for $q, r$ not both equal to 1.

Case 2. $q = 1$ and $r = 1$. Because $\Psi\Psi^{-1} = I$ and all entries of the first
row of $\Psi$ are 1, we have


(55) |Tri'| = 


A+l 


1 - E 'prl 


r=2 


A +1 

<i+^i4tr,'i<c'-i(f)i^ 

r=2 


J \ K 


=t _ 1 


Combining (54) and (55) we get 


< c-Hm +1)7^ 


q,r 


i \ K-l 


e — 1 


□ 
6. Proofs for the multiperiod problem. As a tool for proving Theorem 3,
we introduce a second sequence of coefficient estimates $\tilde\beta_n$ and $\tilde\gamma_n$. At each
$n$, $\tilde\beta_n$ is the vector of coefficients that would be obtained using the algorithm
of Section 2.2 if the coefficients $\beta_{n+1}$ were known exactly. More explicitly,

with $V_{n+1}$ as in (10). The distinction between this and Step 2 of the algorithm
is that here $V_{n+1}$ uses the true coefficients $\beta_{n+1}$ [as in (9)], whereas $\hat V_{n+1}$
in (13) uses the estimated coefficients $\hat\beta_{n+1}$. The estimates $\tilde\beta_n$ and $\tilde\gamma_n$ are
not computable in practice and are simply used as a device for the proof.
From the coefficients $\tilde\beta_n$ define

$\tilde C_n(x) = \sum_{k=0}^{K} \tilde\beta_{nk}\,\psi_{nk}(x).$

Thus, $\tilde C_n$ results from applying the estimated operator to the exact
function $C_{n+1}$, whereas $\hat C_n$ results from applying the estimated operator to
the estimated function $\hat C_{n+1}$.

The proof of Theorem 3 also relies on two lemmas.

Lemma 5. Under conditions (A0) and (B1)-(B3),

$|\gamma_{m-n}|^2 \le \bigl(2H_K E[\psi_{mK}^4(S_m)]\bigr)^{n-1}(K+1)^{n+1}\bigl(E[\psi_{mK}^2(S_m)]\bigr)^2(1+o(1))$

for $n = 1, \dots, m-1$.

Proof. First note that for any $x$

$C_n^2(x) = (\psi_n^\top(x)\Psi_n^{-1}\gamma_n)^2 \le |\psi_n(x)|^2\,\|\Psi_n^{-1}\|^2\,|\gamma_n|^2.$

By the definition of $\gamma$, together with the fact $|\max\{a,b\}| \le |a| + |b|$, we get

— I E['0ri/c(5*72,) niax{fi77_|_i (5*72-j-l)5 Cn+l ( 5 * 72 + 1 )}] I 

< E[\'ll;nk{Sn)hn+l{Sn+l)\] + E[|(+)(‘S'n+l)^7l7n+lI] 

<^E[V^2^(+)]E[h7i(5„+i)] 

+ ||^7i|||7n+l7E[V^7(50|V'n+l(5n+l)P] 

< Vc^E[V’^^( 5™)] +B^|7„+i7(iF + 1)^E[V>^^(5^)]. 
The last inequality uses (B1), (B3) and the inequality $E[h^2] \le \sqrt{E[h^4]}$. Thus,

k=Q 

(56) < 2{K + l)E[V^^^(5^)](c^ + Bl{K + l)|7n+iP) 

< 2{K + + |7n+iP), 

with Hk = max{c^, + 1)} as defined in Section 4.1. 

Conditions (Bl) and (B2) imply that 

I7r.-i|" < ll'I^m-1 f <{K + 

Then (56) gives 

|7™_2|2 < 2{K + l)E.[iPt,K{Sm)]HK{l + I7m-l|") 

= 2HK^[i^U{Smm + l)3(Efc(5,,)])2(l + 0(1)), 

|7^_3|2 < 2(iG + 1)E[V'^^(5™)]/7^(1 + \-im-2?) 

= {2HKE[i;U{Sm)]f{K + l)"(Efc(S^)])2(l + o(l)) 
and, proceeding by induction, completes the proof. □ 

Lemma 6. Under conditions (A0) and (B1)-(B3),

m—n 

mCn-Cn\\l]<BK E 
1=1 


Proof. By the definition of $\hat C$, $\tilde C$ and $C$ and the triangle inequality, we
have

mCn - CnWl] = mLnCn+1 - L^Cn+lWl] 

— E[||LnCn,^i LjiCn+l\\n T ||An(7n+l ||yj]. 


Now, 


I-'nCn+l LjiCn-\-l — Ip^ 4*^ {'Jn 7n.) 


SO 


|AnC*n+l AnC'n+lll 
The same bound holds with LnCn+i replaced by LnCn+i and 7 ^ replaced
by 7 n. Thus, 

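(57) $E[\|\hat C_n - C_n\|_n^2] \le B_K\bigl(E[|\hat\gamma_n - \tilde\gamma_n|^2] + E[|\tilde\gamma_n - \gamma_n|^2]\bigr).$

Using the definitions of $\hat\gamma_n$ and $\tilde\gamma_n$ and the inequality $|\max\{a,b\} - \max\{a,c\}| \le |b-c|$,
we get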




N 


(i) 




N 


2 = 1 


- max{/in+i(5^*|i),C'„+i(5^‘|i)}| 


(58) 


< 


1 


N 


n+1) 

2 


^■^|^„fc(5«)| 14+1(5^ 1 ) - a+i(5« ,)l 


1 ^ 

< ]^EV’nfc(5i‘^)(C'n+l(5W 1 ) - a+i(5iE))'- 
2 = 1 

The paths $S^{(i)}$, $i = 1, \dots, N$, in this expression are independent of the
coefficients of $\hat C_{n+1}$ (see Step 2 of the algorithm), so

(59) E[{%j, - %j,f] = E[^|;lf,{Sn)iCn+l{Sn+l) - Cn+l{Sn+l))% 

with {Sn,Sn+i) independent of the coefficients of Cn+i- 
To bound (59), we use 

(Cn+l(5’n+l) - Cn+liSn+l))'^ = (V’El('^'n+O^El(^n+l “ 7n+l))^ 

— l^n+l('S’n+l)| ||il^n-|-lll l7n-|-l ~ 7n-|-l| • 
The independence of {Sn,Sn+i) and 7^+1 then gives 

E[^l,{Sn){Cn+l{Sn+l) - Cn+l{Sn+l)f] 

< ||T;j,||2E[V>^,(5„)|V>n+l(5n+l)|']E[|W -7n+lP] 

< Bl{K + l)E[i;lKiSn)i’l+l,K{Sn+im\%+l - 7n+l|'] 

< Bj,{K + 1)^E[V>^^(5„)]E[V^4^,^^(5„+i)]E[|%+i - 7„+i|2] 

< i?i(K + l)E[V>ix(5m)]E[| W - 7n+lp], 

the last inequality following from (Bl). Using this bound with (58) and (59), 
we get 

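$$\begin{aligned}
E[|\hat\gamma_n - \tilde\gamma_n|^2] &= \sum_{k=0}^{K} E[(\hat\gamma_{nk} - \tilde\gamma_{nk})^2] \\
&\le (K+1)^2 B_K^2\, E[\psi_{mK}^4(S_m)]\, E[|\hat\gamma_{n+1} - \gamma_{n+1}|^2] \\
&\le A_K\, E[|\hat\gamma_{n+1} - \gamma_{n+1}|^2] \qquad (60) \\
&\le A_K\, E[|\hat\gamma_{n+1} - \tilde\gamma_{n+1}|^2] + A_K\, E[|\tilde\gamma_{n+1} - \gamma_{n+1}|^2]. \qquad (61)
\end{aligned}$$

By iteratively using (60)-(61), we get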

mn-ln?] 

m—n—1 

<^r”-'E[|7m-i-7m-iP]+ E ^r”-'E[|7m-z 

1=2 

m—n—1 

= ^r"-'E[|7m-l-7m-lP]+ E ^r"-'E[|7^_z 

1=2 

m—n—1 

= E ^r”-'E[|7m-i-7m-/P], 

1=1 

because $\hat\gamma_{m-1} = \tilde\gamma_{m-1}$ (since $\hat C_m = C_m = 0$). Using this bound in (57)
concludes the proof. □


-7m-«P] 
'Im—l I ] 


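Proof of Theorem 3. Because each $\tilde\gamma_{nk}$ is an unbiased estimate of
the corresponding $\gamma_{nk}$, $E[(\tilde\gamma_{nk} - \gamma_{nk})^2]$ is the variance of $\tilde\gamma_{nk}$, an average
over $N$ independent paths, and is therefore bounded above by $1/N$ times the second
moment of each summand. Thus,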

E[|7m—n. 7m—nl ] 


K 

— ^ ( E[(7m—n,fc '^m—n,k) ] 

/c=0 

^ 1 2 2 2 

— E/ ~j^^\'^rn-n,k{^rn-n)'kll^l^{hm-ri+l{^rn-n+l)-,Cm-n+l{^rn-n+l)Y\ 
k=0 


K 


— E jy^bPm-n,ki^rn-n){hm-n+li^rn-n+l) + Cj^_^_^i{Sm-n+l))] 


fc =0 

K 


— jiq^V^rn-n,k{^rn-n)hm-n+l{^rn-n+l)\ 


k=0 


K 1 

+ E '^^\''Prn-n.k{^m-n)\\^m-n\\ \lm-n\ |V’m-n+1 (5'm-n+l)| ]• 
fc =0 


(62) 
For the first term in (62) we use the Cauchy-Schwarz inequality, (B1) and
(B3) to get

^ 1 2 2 
k=0 

( 63 ) < ^jE['lp^j^{Sm)]^[hfn-n+li^m-n+l)] 

<^HKE[i;UiSm)]. 

For the second term in (62) we again use Cauchy-Schwarz and (B1) to get


K 


T,T^['^m—nMi^rn—n)\\"^rn—n\\ |7m-n| |V’m—n+1 ('S'm—n+l)| 


(64) 


k=0 


<^^HKE[^l;tKiSm)]hm-n\^■ 


N 


Combining (62)-(64) and Lemma 5 we arrive at 
E[|7m—n ^m—n\ ] 

^ (X^^2-1(^^E[V^^^(5™)])"(E[V>^^(5™)])2(1 + o(1)) 


N 

2”-i(iC + l)2 
N 


(l+o(l))- 


By Lemma 6, we now get 

E[\\Cr,-Cn\\l] 


C m—n \ 

X (l + 2 + ... + 2’"-“-‘){l+o(l)) 

= ( 2 ”“-” - + <,{!)), 
which concludes the proof. □ 


7. Concluding remarks. It is natural to ask to what extent our results
depend on the fact that the basis functions we consider are polynomials.
Some insight into this question can be gleaned from the analysis of the lower
bound on MSE($\hat\beta$) in the proof of Theorem 1. The lower bound results from
choosing $V = a_K\psi_{2K}(S_2)$ and its growth is driven by the second moment
$a_K^2 E[\psi_{2K}^2(S_2)\psi_{1K}^2(S_1)]$. With $\psi_{1K}$ orthogonal to the other basis functions at
$t_1$, the condition $|\beta| = 1$ translates to $a_K = 1/E[\psi_{2K}(S_2)\psi_{1K}(S_1)]$. Thus, the
growth of the lower bound is driven by the growth of the ratio

$\dfrac{E[\psi_{2K}^2(S_2)\psi_{1K}^2(S_1)]}{\bigl(E[\psi_{2K}(S_2)\psi_{1K}(S_1)]\bigr)^2}$

as $K$ increases. A few examples show that this ratio does indeed grow with
$K$ even for choices of functions that grow much less quickly than polynomials.
In the case of Brownian motion, explicit calculations show that
for $\psi_{jK}(x) = \mathbf{1}\{x > K\}$, the ratio is $O(K\exp(K^2/(2t_1)))$ and for $\psi_{jK}(x) =
\max\{0, x-K\}$, the ratio is $O(K^2\exp(K^2/(2t_1)))$, so in both of these cases
the growth rate is even faster than for the polynomials in Theorem 1.
With $\psi_{jK}(x) = x^K\exp(-x)$, numerical calculations indicate that the ratio
is roughly linear in $K$ (thus requiring roughly linear growth of $N$), but its
magnitude is very large even at small values of $K$. These simple illustrations
suggest that the phenomena observed in this paper may occur more
generally. But see [7] for more positive results using bounded basis functions.